Geographic Reference Analysis For Geographic Document Querying

Authors

  • Frederik Bilhaut
  • Thierry Charnois
  • Patrice Enjalbert
  • Yann Mathet
Abstract

The work presented in this paper concerns Information Retrieval from geographical documents, i.e. documents with a major geographic component. The final aim, in response to an informational query from the user, is to return a ranked list of relevant passages in selected documents, allowing text browsing within them. We consider in this paper the spatial component of the texts and the queries. The idea is to perform an off-line linguistic analysis of the document, extracting spatial expressions (i.e. expressions denoting geographical localisations). The point is that such expressions are (in general) much more complex than simple place names. We present a linguistic analyser which recognises them, performs a semantic analysis and computes symbolic representations of their "content". These representations, stored in the text by means of XML annotation, act as indexes of passages against which queries are compared. Matching queries with text expressions is a complex process requiring several kinds of numeric and symbolic computation. A prospective outline of it is described.

1 Presentation of the GeoSem project. Passage extraction from geographical documents

The work presented in this paper concerns Information Retrieval (IR) from geographical documents, i.e. documents with a major geographic component. Let us state at once that we are mainly interested in human geography, where the phenomena under consideration are of a social or economic nature. Such documents are produced and consumed on a massive scale by academics as well as state organisations, marketing services of private companies and so on. The final aim is, in response to an informational query from the user, to return not only a set of documents (taken as wholes) from the available collection, but also a list of relevant passages allowing text browsing within them.

Geographical information is spatialised information, that is, information anchored, so to speak, in a geographical space.
This characteristic is immediately visible in geographical documents, which describe how some phenomena (often quantified, either numerically or qualitatively) are related to a spatial and often also a temporal localisation. Figure 1 gives an example of this informational structure, extracted from our favourite corpus (Hérin, 1994), which concerns the educational system in France. As a consequence, a natural way to query documents is through a 3-dimensional topic, Phenomenon-Space-Time, as shown in Figure 2. The goal is to select passages that fulfil all of these criteria and to return them to the user in order of relevance.

The system we have designed and are currently developing for that purpose is divided into two tasks: an off-line one, devoted to linguistic analysis of the text, and an online one concerning querying itself. We give here an overall view of the process, focusing on the spatial dimension of texts and analysis. Other aspects of the project, in particular the analysis of expressions denoting phenomena, the techniques used to link the three components of information (Space, Time, Phenomena) and implementation issues, can be found in (Bilhaut, 2003).

Concerning text analysis, the goal is to locate, extract and analyse the expressions which refer to some geographical localisation, so that they act as indexes of text passages (temporal expressions, which express temporal localisation, are treated in a similar manner). The first point to note is that we have to cope (in general) with complex nominal expressions, not only named geographical entities, as exemplified in Figure 3. Indeed the collection of (proper) place names can

HLT-NAACL 2003 Workshop: Analysis of Geographic References, pp. 55-62. Edmonton, May-June 2003.
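
The three-part Phenomenon-Space-Time query described in the text can be illustrated with a small sketch. It is not the GeoSem matching procedure, which the excerpt only outlines: the `Passage` class, the `overlap` and `rank_passages` functions, the index terms and the scoring rule (fraction of query terms found, averaged over the three dimensions) are all assumptions made for illustration. Plain set overlap stands in for the numeric and symbolic computations the paper mentions.

```python
from dataclasses import dataclass, field


@dataclass
class Passage:
    """A text passage with hypothetical index terms for each dimension."""
    text: str
    phenomenon: set = field(default_factory=set)
    space: set = field(default_factory=set)
    time: set = field(default_factory=set)


def overlap(query_terms, passage_terms):
    """Fraction of query terms found in the passage.

    A dimension left empty in the query places no constraint, so it scores 1.0.
    """
    if not query_terms:
        return 1.0
    return len(query_terms & passage_terms) / len(query_terms)


def rank_passages(passages, phenomenon=frozenset(), space=frozenset(), time=frozenset()):
    """Return (score, passage) pairs matching every dimension, best first."""
    ranked = []
    for p in passages:
        scores = [
            overlap(set(phenomenon), p.phenomenon),
            overlap(set(space), p.space),
            overlap(set(time), p.time),
        ]
        # Keep only passages that satisfy all three criteria to some degree.
        if all(s > 0 for s in scores):
            ranked.append((sum(scores) / len(scores), p))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


# Toy index terms (invented for this example, not taken from the corpus).
passages = [
    Passage("passage A", {"schooling"}, {"north", "france"}, {"1980s"}),
    Passage("passage B", {"schooling", "employment"}, {"france"}, {"1990s"}),
    Passage("passage C", {"employment"}, {"spain"}, {"1980s"}),
]

# Query on phenomenon and space only; the time dimension is left open.
results = rank_passages(passages, phenomenon={"schooling"}, space={"north", "france"})
for score, passage in results:
    print(f"{score:.2f}  {passage.text}")
```

In this toy setting a passage is returned only if it matches every dimension the query specifies, which mirrors the requirement that passages fulfil all criteria before being ranked. A real system would replace the term sets with the symbolic representations computed by the linguistic analyser.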

Similar resources

GIR Experimentation

The Geographic Information Retrieval (GIR) community has generally accepted the thesis that both the thematic and the geographic aspects of documents are useful for GIR. This paper describes a preliminary experiment exploring this thesis by separately indexing/searching geographically relevant terms (place names, geo-spatial relations, geographic concepts and geographic adjectives) extracted from reference ...

Validation of Volunteered Geographic Information Landuse Change Using Satellite Imagery

Land use change monitoring is one of the main concerns of managers and urban planners due to human activities and unbalanced physical development in urban areas. In this paper, a combination of remote sensing data and volunteered geographic information was used to assess the quality of volunteered geographic information on land use and land cover changes monitoring. For this purpose, the ORBVIE...

Mapgets: A Tool for Visualizing and Querying Geographic Information

Although geographic applications vary widely, their user interfaces have many requirements in common because of the spatial component of their data. Geographic data is not standard data, and appropriate tools are required (i) for editing (i.e., displaying and modifying) and (ii) for querying it. In this paper, we firstly study the major aspects of visualizing and querying geographic information w...

Multimodal Indexation of Contrastive Structures in Geographical Documents

This paper deals with the indexing of multi-modal geographic documents by means of two constructs: geographic entities and their semantic relations. These relations concern more specifically contrast or similarity between the entities with regard to the described phenomenon. Geographic entities are retrieved in both text and maps using proper analysis techniques. Contrast or similarity relati...

A Digital GeoLibrary: Integrating Keywords and Place Names

A digital library typically includes a set of keywords (or subject terms) for each document in its collection(s). For some applications, including natural resource management, geographic location (e.g., the place of a study or a project) is very important. The metadata for such documents needs to indicate the location(s) associated with a document and users need to be able to search for documen...

Publication date: 2003